Knee Replacement Surgery Outcome

Introduction

This research aims to determine whether the postoperative function of a total knee replacement recipient can be modeled from anthropometric data and details of the surgical procedure. We test models with two postoperative scoring systems as the response variable. The first is the “Surgeon’s Score,” which measures more objective postoperative metrics about the condition of the replaced knee following surgery. These metrics include: how straight or how flexed the patient can make their knee, how well aligned the knee appears when inspected from a frontal perspective, and how much laxity or separation can be achieved by placing the lower leg under tension. Finally, the patient is asked about the overall pain they experience with the knee. The resulting score is a value between 0 and 100, with 80-100 considered an excellent result.

The second score we model is the “Patient’s Score,” which is generated from three questions: how far the patient is able to walk, how well they can navigate stairs, and whether they require any walking aids (crutches, cane, etc.).

We hope to create a model that can predict each of the scores based on a variety of categorical and numeric predictors.

Background information

This data has been collected from a Southern California hospital in collaboration with the Shiley Center for Orthopedic Research. The data is used for both clinical research and postoperative monitoring of implant survivorship. All patient-specific information has been removed from the data so that it adheres to the Health Insurance Portability and Accountability Act (HIPAA).

Statement

We hope to determine the impact a particular surgeon or the type of implants used has on the patient’s outcome after surgery. We are also interested in exploring the potential relationship between anthropometric factors and the patient’s postoperative satisfaction. Finally, we would like to understand whether the considerable expense of computer navigation in knee replacement leads to better postoperative scores. Nevertheless, we keep our options open to investigate other aspects that were not initially visible or considered but may reveal themselves as the project progresses.

Here we may consider KneeScore_Surgeon or KneeScore_Patient as the response and Age, Gender, Weight, Height, BMI, Race, Side, and Manufacturer as predictors. Gender, Race, Side, and Manufacturer are categorical predictors, while Age, Height, Weight, and BMI are numerical predictors.

knee = read.csv('knees.csv')
knee$Surgeon = as.factor(knee$Surgeon)
knee$Year = as.factor(knee$Year)

Functions for model evaluation

#Function to draw diagnostic plots: Q-Q plot, Fitted vs. Residuals plot, and a histogram of the residuals.
plots = function(model) {
  par(mfcol = c(2, 2))
  qqnorm(resid(model), col = 'red', pch = 16)
  qqline(resid(model), col = 'lightblue', lwd = 2)
  plot(fitted(model), resid(model), xlab = 'Fitted Values', ylab = 'Residuals', main = 'Fitted vs. Residuals Plot', col = 'purple', pch = 16)
  abline(h = 0, col = 'green', lwd = 2)
  hist(resid(model), col = 'blue', main = 'Histogram of Residuals', xlab = 'Residuals')
}

#Function to calculate various statistics which we use to evaluate the model.
#bptest() requires the lmtest package (library(lmtest)).
evaluate = function(model){
  # Run each of the tests
  lambda = 1.000001 # workaround so the function still returns a cross validation score: hat values of exactly 1.0 would cause division by zero
  shapiroTest = shapiro.test(resid(model))
  bpTest = bptest(model)
  rSquared = summary(model)$r.squared
  adjRSquared = summary(model)$adj.r.squared
  loocv = sqrt(mean((resid(model) / (lambda - hatvalues(model))) ^ 2))
  large_hat = sum(hatvalues(model) > 2 * mean(hatvalues(model)))
  # proportion of standardized residuals larger than 2 in magnitude (computed but not reported below)
  large_resid = sum(abs(rstandard(model)) > 2) / length(rstandard(model))

  # Collect tests in a dataframe
  values = data.frame(Result = c(prettyNum(shapiroTest$p.value), prettyNum(bpTest$p.value), prettyNum(rSquared), prettyNum(adjRSquared), prettyNum(loocv), prettyNum(large_hat)))
  rowNames = data.frame(Test = c('Shapiro Wilk pvalue', 'Breusch-Pagan pvalue', 'R Squared', 'Adj R Squared', 'Cross Validation', 'Large Hat Values'))

  summary_output = cbind(rowNames, values)
  show(summary_output)
  plots(model)
}
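The cross validation statistic above relies on the closed-form leave-one-out identity for linear models: the deleted residual equals \(e_i / (1 - h_i)\), so no refitting is needed. Below is a toy check of that identity using the built-in mtcars data (not the knee data; the formula mpg ~ wt + hp is an arbitrary example):

```r
# Shortcut LOOCV RMSE from ordinary residuals and hat values.
fit = lm(mpg ~ wt + hp, data = mtcars)
shortcut = sqrt(mean((resid(fit) / (1 - hatvalues(fit))) ^ 2))

# Brute-force LOOCV: refit n times, each time holding one row out.
errs = sapply(seq_len(nrow(mtcars)), function(i) {
  f = lm(mpg ~ wt + hp, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(f, mtcars[i, ])
})
brute = sqrt(mean(errs ^ 2))

all.equal(shortcut, brute)  # TRUE
```

The lambda workaround in evaluate() exists because a hat value of exactly 1 makes the denominator zero; the identity simply does not apply to such points.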
# find and remove all influential values and return the resulting dataframe
removeInfluential = function(model, data){
  large_lev_index = which(hatvalues(model) > 2 * mean(hatvalues(model)))
  cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
  rStandard = abs(rstandard(model))
  rStandardIndex = which(rStandard > 2)
  index_ = intersect(intersect(large_lev_index, cooks), rStandardIndex)
  newData = data[-index_ , ]
  return(newData)
}

# identify all influential data points
IdentifyInfluential = function(model, data){
  large_lev_index = which(hatvalues(model) > 2 * mean(hatvalues(model)))
  cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
  rStandard = abs(rstandard(model))
  rStandardIndex = which(rStandard > 2)
  index_ = intersect(intersect(large_lev_index, cooks), rStandardIndex)
  return(index_)
}

# find large hat values, subtract them from the dataframe and return the dataframe
removeLargeHatValues = function(model, data){
  largeHat = which(hatvalues(model) > 2 * mean(hatvalues(model)))
  newData = data[-largeHat, ]
  return(newData)
}
# calculate the prediction accuracy within a margin (i.e. if the margin is set to 2, then a predicted value of 35.5 counts as correct when the actual value is between 33.5 and 37.5)
prediction_accuracy = function(model, data, actual, margin) {
  prediction = predict(model, newdata = data)
  upper = actual + margin
  lower = actual - margin
  return(sum(prediction < upper & lower < prediction) / length(prediction))
}

rmse  = function(actual, predicted) {  sqrt(mean((actual - predicted) ^ 2))}

Knee Score Surgeon

Begin by fitting a full additive model as a baseline and evaluating.

surgeon_full_additive_model = lm(KneeScore_Surgeon ~ ., data = knee)
evaluate(surgeon_full_additive_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 2.080142e-08
## 2 Breusch-Pagan pvalue    0.9192514
## 3            R Squared    0.2718411
## 4        Adj R Squared    0.2130606
## 5     Cross Validation      14.4126
## 6     Large Hat Values          177

The baseline model has many large hat values. Let’s try to understand why.

largeHat = which(hatvalues(surgeon_full_additive_model) > (2 * mean(hatvalues(surgeon_full_additive_model))))
unique(largeHat)
##   [1]    7    9   10   17   19   28   30   40   52   74   78   80   87   91
##  [15]  110  118  121  123  125  149  154  161  198  233  234  235  236  237
##  [29]  238  240  243  244  257  258  280  302  343  347  352  354  369  371
##  [43]  372  397  400  404  427  446  461  473  482  505  527  529  530  531
##  [57]  532  533  534  535  544  547  552  554  559  573  638  679  681  689
##  [71]  700  703  763  785  818  844  849  876  881  883  885  888  891  933
##  [85]  952  976  979  989  996 1030 1032 1047 1055 1057 1060 1063 1064 1065
##  [99] 1066 1067 1077 1093 1096 1099 1104 1107 1117 1159 1161 1178 1182 1202
## [113] 1213 1220 1241 1245 1246 1252 1257 1273 1284 1352 1370 1374 1376 1379
## [127] 1385 1386 1392 1406 1415 1431 1450 1466 1467 1488 1500 1502 1503 1515
## [141] 1517 1521 1522 1523 1540 1541 1544 1554 1568 1572 1587 1601 1617 1686
## [155] 1689 1692 1694 1700 1746 1762 1771 1773 1774 1775 1778 1781 1783 1784
## [169] 1785 1863 1864 1865 1914 1919 1927 1938 1939
print(knee[689, ])
##     Surgeon Year Age Gender Weight Height   BMI      Diagnosis
## 689       4 2012  68 Female    162     63 28.72 Osteoarthritis
##                 Race  Side KneeScore_Surgeon KneeScore_Patient GPS
## 689 Mid-East Arabian Right                50                50  No
##     Manufacturer FemoralComponentModel FemoralComponentSize
## 689      Stryker             Triathlon                    4
##     FemoralComponentType TibialTrayModel TibialTraySize TibialInsertModel
## 689   cruciate retaining       Triathlon              4         Triathlon
##     TibialInsertWidth TibialInsertType                 PatellaModel
## 689                11        high flex Triathlon Asymmetric Patella
##     PatellaDiameter
## 689              39
print(knee[1927, ])
##      Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race
## 1927       1 2015  62   Male    200     68 30.44 Osteoarthritis White
##           Side KneeScore_Surgeon KneeScore_Patient GPS   Manufacturer
## 1927 Bilateral                42                50  No Smith & Nephew
##      FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1927                Legion                   8+   cruciate retaining
##      TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1927      Genesis II              7    Legion CR XLPE                13
##      TibialInsertType PatellaModel PatellaDiameter
## 1927        high flex   Genesis II              38

After inspecting the rows with large hat values, it seems there is a preponderance of values that are either errant entries or valid but rare entries. For example, the Race predictor has only two entries for Mid-East Arabian, and the FemoralComponentSize entry of ‘8+’ is errant. First we will attempt to remove the predictors that seem prone to errant or rare values, and later we will remove the high-leverage values directly.
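A quick way to confirm the rare-level suspicion is to tabulate factor levels and flag those seen only once or twice. A minimal sketch with a synthetic factor standing in for knee$Race (level names borrowed from the printouts above); the same check runs on the real data via table(knee$Race):

```r
# Levels that occur at most twice are candidates for extreme leverage.
race = factor(c(rep('White', 6), rep('Hispanic', 3), 'Mid-East Arabian'))
counts = table(race)
names(counts[counts <= 2])  # "Mid-East Arabian"
```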

We attempted several model iterations with transformations of both the response and the predictors, as well as large models with all two-way interactions, none of which proved helpful in predicting the response. The models with many interactions took substantial time and computing resources to fit and were suspected of overfitting. We pick up here with model eight, the first of the promising models. The interested reader can find the earlier attempted models in the appendix.

surgeon_model_eight = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(surgeon_model_eight)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.989833e-07
## 2 Breusch-Pagan pvalue    0.1270351
## 3            R Squared    0.2374036
## 4        Adj R Squared    0.2084443
## 5     Cross Validation     14.34844
## 6     Large Hat Values          126

anova(surgeon_model_eight, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ (Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType - 
##     FemoralComponentModel - PatellaModel - TibialTrayModel - 
##     TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1896 379610                           
## 2   1821 362468 75     17142 1.1483 0.1845

While removing the predictors doesn’t yield a definitively better model at a reasonable \(\alpha\), it does increase Adjusted \(R^2\) and slightly improves the cross validation score.

Utilize a Bayesian Information Criterion approach and evaluate the selected model.
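A note on mechanics: step() minimizes AIC by default (penalty k = 2), and passing k = log(n) makes the same routine minimize BIC, which penalizes model size more heavily. A toy illustration on mtcars (not the knee data):

```r
fit = lm(mpg ~ ., data = mtcars)
n = nrow(mtcars)

aic_fit = step(fit, direction = "backward", trace = 0)             # k = 2, i.e. AIC
bic_fit = step(fit, direction = "backward", k = log(n), trace = 0) # k = log(n), i.e. BIC

# The selected model is a submodel of the starting model.
length(coef(bic_fit)) <= length(coef(fit))
```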

n = length(resid(surgeon_full_additive_model))
surgeon_model_back_bic = step(surgeon_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(surgeon_model_back_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.169168e-05
## 2 Breusch-Pagan pvalue 0.0001309477
## 3            R Squared    0.1869122
## 4        Adj R Squared    0.1852562
## 5     Cross Validation     14.37628
## 6     Large Hat Values           87
anova(surgeon_model_back_bic, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS  Df Sum of Sq      F   Pr(>F)    
## 1   1964 404744                                  
## 2   1821 362468 143     42276 1.4853 0.000288 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting BIC model scores marginally better in cross validation, though the ANOVA results clearly prefer the full additive model.

Next, we will evaluate the RMSE of the best performing models thus far.

options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]


full_test = rmse(test_data$KneeScore_Surgeon, predict(surgeon_full_additive_model, test_data))
full_train = rmse(train_data$KneeScore_Surgeon, predict(surgeon_full_additive_model, train_data))

test_mod_8 = rmse(test_data$KneeScore_Surgeon, predict(surgeon_model_eight, test_data))
train_mod_8 = rmse(train_data$KneeScore_Surgeon, predict(surgeon_model_eight, train_data))

bic_test = rmse(test_data$KneeScore_Surgeon, predict(surgeon_model_back_bic, test_data))
bic_train = rmse(train_data$KneeScore_Surgeon, predict(surgeon_model_back_bic, train_data))
Model      | Train RMSE | Test RMSE
-----------|------------|-----------
Model All+ | 13.6920687 | 13.1627856
Model 8    | 13.9816352 | 13.5713295
Model BIC  | 14.40996   | 14.1023973

The full model delivers the best RMSE on both the training and test sets.

Perhaps removing some influential data points will help create a better model. Next, we remove influential points that satisfy all three criteria: large leverage, large Cook’s distance, and large standardized residuals.

new_dataframe = removeInfluential(surgeon_model_eight, knee)
surgeon_model_nine = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(surgeon_model_nine)
##                   Test       Result
## 1  Shapiro Wilk pvalue 6.718107e-07
## 2 Breusch-Pagan pvalue    0.2390135
## 3            R Squared     0.241005
## 4        Adj R Squared    0.2124774
## 5     Cross Validation     14.17558
## 6     Large Hat Values          119

This does seem to be of moderate benefit in relation to Adjusted \(R^2\) and Cross Validation.

We will evaluate the same model once again after removing the identified large hat values.

remove_hat_dataframe = removeLargeHatValues(surgeon_model_nine, new_dataframe)
surgeon_model_ten = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(surgeon_model_ten)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.604033e-06
## 2 Breusch-Pagan pvalue   0.08198251
## 3            R Squared    0.2187277
## 4        Adj R Squared    0.2013757
## 5     Cross Validation     14.32124
## 6     Large Hat Values           93

Removing the large hat values actually increased the cross validation score. Perhaps we can find a better model by running BIC on the modified dataframe.

n = length(resid(surgeon_model_ten))
surgeon_model_10_bic = step(surgeon_model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(surgeon_model_10_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 0.0001065612
## 2 Breusch-Pagan pvalue 8.666681e-05
## 3            R Squared    0.1862472
## 4        Adj R Squared    0.1844753
## 5     Cross Validation     14.33378
## 6     Large Hat Values           79

Despite the reduction in high-leverage values, the selected model is no better than the preceding models.

sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]

test_mod_10 = rmse(test_data_two$KneeScore_Surgeon, predict(surgeon_model_ten, test_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Surgeon, predict(surgeon_model_ten, train_data_two))

test_10_bic = rmse(test_data_two$KneeScore_Surgeon, predict(surgeon_model_10_bic, test_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Surgeon, predict(surgeon_model_10_bic, train_data_two))
Model        | Train RMSE | Test RMSE
-------------|------------|-----------
Model 10     | 14.0424519 | 17.1835625
Model 10 BIC | 14.313086  | 17.0115541

Removing the large hat values has had the opposite of the intended effect, increasing rather than decreasing the RMSE values.

Knee Score Patient

Begin by fitting a full additive model as a baseline and evaluating.

pat_full_additive_model = lm(KneeScore_Patient ~ ., data = knee)
evaluate(pat_full_additive_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue    0.9539336
## 3            R Squared    0.3161338
## 4        Adj R Squared    0.2609288
## 5     Cross Validation     17.07365
## 6     Large Hat Values          178

The baseline model has many large leverage values which, upon inspection, appear to be coming from several of the implant fields. Perhaps removing the suspect implant predictors will improve the model.

pat_model_eight = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(pat_model_eight)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.115521e-14
## 2 Breusch-Pagan pvalue   0.00453633
## 3            R Squared    0.2810507
## 4        Adj R Squared    0.2537489
## 5     Cross Validation     16.88212
## 6     Large Hat Values          126

anova(pat_model_eight, pat_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Patient ~ (Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType - 
##     FemoralComponentModel - PatellaModel - TibialTrayModel - 
##     TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1   1896 530943                              
## 2   1821 505034 75     25909 1.2456 0.07874 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

While removing the predictors doesn’t yield a definitively better model at a reasonable \(\alpha\), it does increase Adjusted \(R^2\) and slightly improves the cross validation score.

Utilize a Bayesian Information Criterion approach and evaluate the selected model.

n = length(resid(pat_full_additive_model))
pat_model_back_bic = step(pat_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_back_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3            R Squared    0.2423439
## 4        Adj R Squared    0.2404141
## 5     Cross Validation     16.91176
## 6     Large Hat Values          112
anova(pat_model_back_bic, pat_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Patient ~ Age + Gender + Weight + Height + KneeScore_Surgeon
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS  Df Sum of Sq      F   Pr(>F)   
## 1   1963 559528                                 
## 2   1821 505034 142     54494 1.3837 0.002592 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting BIC model scores marginally better in cross validation, though the ANOVA results clearly prefer the full additive model.

Next, we will evaluate the RMSE of the best performing models thus far.

options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]

pat_full_test = rmse(test_data$KneeScore_Patient, predict(pat_full_additive_model, test_data))
pat_full_train = rmse(train_data$KneeScore_Patient, predict(pat_full_additive_model, train_data))

pat_test_mod_8 = rmse(test_data$KneeScore_Patient, predict(pat_model_eight, test_data))
pat_train_mod_8 = rmse(train_data$KneeScore_Patient, predict(pat_model_eight, train_data))

pat_bic_test = rmse(test_data$KneeScore_Patient, predict(pat_model_back_bic, test_data))
pat_bic_train = rmse(train_data$KneeScore_Patient, predict(pat_model_back_bic, train_data))
Model      | Train RMSE | Test RMSE
-----------|------------|-----------
Model All+ | 13.6920687 | 13.1627856
Model 8    | 13.9816352 | 13.5713295
Model BIC  | 14.40996   | 14.1023973

The additive model with all predictors has the best RMSE values.

Perhaps removing some influential data points will help create a better model. Next, we remove influential points that satisfy all three criteria: large leverage, large Cook’s distance, and large standardized residuals.

new_dataframe = removeInfluential(pat_model_eight, knee)
pat_model_nine = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(pat_model_nine)
##                   Test       Result
## 1  Shapiro Wilk pvalue 2.138818e-14
## 2 Breusch-Pagan pvalue  0.006017515
## 3            R Squared    0.2818728
## 4        Adj R Squared      0.25453
## 5     Cross Validation     16.75786
## 6     Large Hat Values          122

This does seem to be of moderate benefit in relation to Adjusted \(R^2\) and Cross Validation.

We will evaluate the same model once again after removing the identified large hat values.

remove_hat_dataframe = removeLargeHatValues(pat_model_nine, new_dataframe)
pat_model_ten = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(pat_model_ten)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.502033e-13
## 2 Breusch-Pagan pvalue 0.0001416128
## 3            R Squared    0.2753799
## 4        Adj R Squared    0.2592862
## 5     Cross Validation     16.92845
## 6     Large Hat Values           94

Removing the large hat values actually increased the cross validation score. Perhaps we can find a better model by running BIC on the modified dataframe.

n = length(resid(pat_model_ten))
pat_model_10_bic = step(pat_model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_10_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.298166e-14
## 2 Breusch-Pagan pvalue 1.347535e-06
## 3            R Squared    0.2491407
## 4        Adj R Squared    0.2470959
## 5     Cross Validation     16.91151
## 6     Large Hat Values          101

Despite the reduction in high-leverage values, the selected model is no better than the preceding models.

sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]

test_mod_10 = rmse(test_data_two$KneeScore_Patient, predict(pat_model_ten, test_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Patient, predict(pat_model_ten, train_data_two))

test_10_bic = rmse(test_data_two$KneeScore_Patient, predict(pat_model_10_bic, test_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Patient, predict(pat_model_10_bic, train_data_two))
Model        | Train RMSE | Test RMSE
-------------|------------|-----------
Model 10     | 16.6282358 | 21.7155956
Model 10 BIC | 16.8517414 | 21.4036662

Interestingly, the models fit on the dataset without high-leverage points fare worse on test RMSE.

Appendix

Knee Score Patient

pat_model = lm(KneeScore_Patient ~ ., data = knee)
pat_model_back_aic = step(pat_model, direction = "backward", trace = 0)
pat_model_for_aic = step(pat_model, direction = "forward", trace = 0)
pat_model_both_aic = step(pat_model, direction = "both", trace = 0)

n = length(resid(pat_model))
pat_model_back_bic = step(pat_model, direction = "backward", k = log(n), trace = 0)
pat_model_for_bic = step(pat_model, direction = "forward", k = log(n), trace = 0)
pat_model_both_bic = step(pat_model, direction = "both", k = log(n), trace = 0)
evaluate(pat_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue    0.9539336
## 3            R Squared    0.3161338
## 4        Adj R Squared    0.2609288
## 5     Cross Validation     17.07365
## 6     Large Hat Values          178

evaluate(pat_model_back_aic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.123084e-14
## 2 Breusch-Pagan pvalue 2.237347e-05
## 3            R Squared    0.2562632
## 4        Adj R Squared    0.2490128
## 5     Cross Validation     16.85518
## 6     Large Hat Values          147

evaluate(pat_model_for_aic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue    0.9539336
## 3            R Squared    0.3161338
## 4        Adj R Squared    0.2609288
## 5     Cross Validation     17.07365
## 6     Large Hat Values          178

evaluate(pat_model_both_aic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.123084e-14
## 2 Breusch-Pagan pvalue 2.237347e-05
## 3            R Squared    0.2562632
## 4        Adj R Squared    0.2490128
## 5     Cross Validation     16.85518
## 6     Large Hat Values          147

## Adj.R.Squared: 0.24 - CrossValidation: 16.9
evaluate(pat_model_back_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3            R Squared    0.2423439
## 4        Adj R Squared    0.2404141
## 5     Cross Validation     16.91176
## 6     Large Hat Values          112

evaluate(pat_model_for_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue    0.9539336
## 3            R Squared    0.3161338
## 4        Adj R Squared    0.2609288
## 5     Cross Validation     17.07365
## 6     Large Hat Values          178

evaluate(pat_model_both_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3            R Squared    0.2423439
## 4        Adj R Squared    0.2404141
## 5     Cross Validation     16.91176
## 6     Large Hat Values          112

## Remove large leverages
new_data = removeInfluential(pat_model, knee)
pat_model_without_large_leverages = lm(KneeScore_Patient ~ ., data = new_data)
evaluate(pat_model_without_large_leverages)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.061121e-14
## 2 Breusch-Pagan pvalue    0.9760654
## 3            R Squared    0.3171425
## 4        Adj R Squared    0.2622432
## 5     Cross Validation     16.88624
## 6     Large Hat Values          174

Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?

which(hatvalues(pat_model) == 1.0)
##   17   19   74   80  240  244  258  354  400  533  638  703  763  844  876 
##   17   19   74   80  240  244  258  354  400  533  638  703  763  844  876 
##  881  891 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927 
##  881  891 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927 
## 1938 
## 1938
print(knee[17,])
##    Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race Side
## 17       3 2010  68 Female    148     64 25.43 Osteoarthritis White Left
##    KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 17                48                50  No  DePuy (J&J)
##    FemoralComponentModel FemoralComponentSize FemoralComponentType
## 17             Nonporous                    2   cruciate retaining
##    TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 17       PFC Sigma            2.5         PFC Sigma                 8
##    TibialInsertType      PatellaModel PatellaDiameter
## 17           curved 3-Post Round Dome              32
print(knee[240,])
##     Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race
## 240       3 2010  66 Female    130     68 19.78 Osteoarthritis White
##          Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 240 Bilateral                60                80  No       Zimmer
##     FemoralComponentModel FemoralComponentSize FemoralComponentType
## 240        Zimmer Precoat                  E -   cruciate retaining
##     TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 240          Nexgen              4            Nexgen                10
##     TibialInsertType PatellaModel PatellaDiameter
## 240         standard       NexGen              32
print(knee[638,])
##     Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race  Side
## 638       3 2012  67 Female    167     62 30.57 Osteoarthritis White Right
##     KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 638                55                55 Yes  DePuy (J&J)
##     FemoralComponentModel FemoralComponentSize FemoralComponentType
## 638                 Sigma                   4N   cruciate retaining
##     TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 638       PFC Sigma              3         PFC Sigma                 8
##            TibialInsertType                 PatellaModel PatellaDiameter
## 638 high flex,Fixed bearing Triathlon Asymmetric Patella              35
print(knee[1093,])
##      Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race
## 1093       3 2013  83 Female    130     63 23.05 Osteoarthritis White
##       Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1093 Right                66                50  No  DePuy (J&J)
##      FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1093                 Sigma                    3 posterior stabilized
##      TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1093       PFC Sigma            2.5         PFC Sigma                 8
##                 TibialInsertType            PatellaModel PatellaDiameter
## 1093 posterior stabilized,curved Oval Dome Patella 3-peg              35
print(knee[1775, ])
##      Surgeon Year Age Gender Weight Height   BMI   Diagnosis  Race  Side
## 1775       3 2014  22   Male    240     77 28.49 Post-trauma White Right
##      KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1775                32                65  No       Zimmer
##      FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1775               Persona                   12   cruciate retaining
##      TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1775         Persona              H           Persona                10
##      TibialInsertType PatellaModel PatellaDiameter
## 1775    Fixed bearing      Persona              38

Each of these rows has a suspect factor entry. Perhaps each entry occurs only infrequently in the database, causing the entire row to be flagged as high leverage. We try pulling the suspect columns out of the lm and see what happens.
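The hat values of exactly 1 seen above are the signature of a singleton factor level: a level appearing in only one row gets its own dummy column, which fits that single observation perfectly. A minimal synthetic demonstration (not the knee data):

```r
set.seed(1)
toy = data.frame(
  y   = rnorm(10),
  lvl = factor(c(rep('common', 9), 'singleton'))
)
fit = lm(y ~ lvl, data = toy)

round(hatvalues(fit), 3)  # the 'singleton' row has hat value 1
resid(fit)[10]            # and its residual is forced to (numerically) zero
```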

pat_model_nine = lm(KneeScore_Patient ~ . - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - FemoralComponentType, data = knee)
evaluate(pat_model_nine)
##                   Test      Result
## 1  Shapiro Wilk pvalue 2.75403e-14
## 2 Breusch-Pagan pvalue  0.08531339
## 3            R Squared   0.2986236
## 4        Adj R Squared    0.259491
## 5     Cross Validation    16.96213
## 6     Large Hat Values         137